Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n)
نویسندگان
چکیده
We consider the stochastic approximation problem where a convex function has to be minimized, given only the knowledge of unbiased estimates of its gradients at certain points, a framework which includes machine learning methods based on the minimization of the empirical risk. We focus on problems without strong convexity, for which all previously known algorithms achieve a convergence rate for function values of O(1/ √ n) after n iterations. We consider and analyze two algorithms that achieve a rate of O(1/n) for classical supervised learning problems. For least-squares regression, we show that averaged stochastic gradient descent with constant step-size achieves the desired rate. For logistic regression, this is achieved by a simple novel stochastic gradient algorithm that (a) constructs successive local quadratic approximations of the loss functions, while (b) preserving the same running-time complexity as stochastic gradient descent. For these algorithms, we provide a non-asymptotic analysis of the generalization error (in expectation, and also in high probability for least-squares), and run extensive experiments showing that they often outperform existing approaches.
منابع مشابه
[hal-00831977, v1] Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n)
We consider the stochastic approximation problem where a convex function has to be minimized, given only the knowledge of unbiased estimates of its gradients at certain points, a framework which includes machine learning methods based on the minimization of the empirical risk. We focus on problems without strong convexity, for which all previously known algorithms achieve a convergence rate for...
متن کاملStochastic Coordinate Descent for Nonsmooth Convex Optimization
Stochastic coordinate descent, due to its practicality and efficiency, is increasingly popular in machine learning and signal processing communities as it has proven successful in several large-scale optimization problems , such as l1 regularized regression, Support Vector Machine, to name a few. In this paper, we consider a composite problem where the nonsmoothness has a general structure that...
متن کاملFast Convergence of Stochastic Gradient Descent under a Strong Growth Condition
We consider optimizing a function smooth convex function f that is the average of a set of differentiable functions fi, under the assumption considered by Solodov [1998] and Tseng [1998] that the norm of each gradient f ′ i is bounded by a linear function of the norm of the average gradient f . We show that under these assumptions the basic stochastic gradient method with a sufficiently-small c...
متن کاملVR-SGD: A Simple Stochastic Variance Reduction Method for Machine Learning
In this paper, we propose a simple variant of the original SVRG, called variance reduced stochastic gradient descent (VR-SGD). Unlike the choices of snapshot and starting points in SVRG and its proximal variant, Prox-SVRG, the two vectors of VR-SGD are set to the average and last iterate of the previous epoch, respectively. The settings allow us to use much larger learning rates, and also make ...
متن کاملMaking Gradient Descent Optimal for Strongly Convex Stochastic Optimization
Stochastic gradient descent (SGD) is a simple and popular method to solve stochastic optimization problems which arise in machine learning. For strongly convex problems, its convergence rate was known to be O(log(T )/T ), by running SGD for T iterations and returning the average point. However, recent results showed that using a different algorithm, one can get an optimal O(1/T ) rate. This mig...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2013